Abstract From the analysis of sizes of approximately 130 small icosahedral viruses we
find that there is a typical structural capsid protein, having a mean diameter of 5 nm and a
mean thickness of 3 nm, with more than two thirds of the analyzed capsid proteins having
thicknesses between 2 nm and 4 nm. To investigate whether, in addition to the fairly
conserved geometry, capsid proteins show similarities in the way they interact with one another,
we examined the shapes of the capsids in detail. We classified them numerically according
to their similarity to a sphere, an icosahedron, and an interpolating set of shapes in between,
all of them obtained from the theory of elasticity of shells. In order to make a unique and
straightforward connection between an idealized, numerically calculated shape of an elastic
shell and a capsid, we devised a special shape fitting procedure, the outcome of which
is the idealized elastic shape that fits the capsid best. Using this procedure we performed
a statistical analysis of a series of virus shapes and found similarities between the capsid
elastic properties of even very different viruses. As we explain in the paper, there are
both structural and functional reasons for the convergence of protein sizes and capsid elastic
properties. Our work presents a specific quantitative scheme to estimate relatedness between
different proteins based on the details of the (quaternary) shape they form (capsid). As such,
it may provide information complementary to that obtained from the studies of other
types of protein similarity, such as the overall composition of structural elements, topology
of the folded protein backbone, and sequence similarity.
1 Introduction

Viruses are the most abundant source of DNA and proteins in Earth's oceans, which contain on
the order of $10^{30}$ virions [1]. Yet their status as living entities is often called into question, as
they do not conform to the "self-reproduction with variations" standard [2]. It appears that
many of their features can be understood in terms of thermodynamic equilibrium physics
[3-5], especially when they are "dormant", i.e. outside the cells which they infect, where they
in fact turn into little more than very complicated macromolecules with a "life cycle" [6].
The fact that many features of their life cycle can be understood within the equilibrium
framework sets them apart from the rest of the living biological systems.

The lack of self-reproduction with variations that we associate with (present-day) life
could also mean that viruses predate precellular life [7], which is partially corroborated
by the fact that there seems to be no living system that is immune to viruses, including
viruses themselves [8]. The long-lasting debate about the position of viruses in the great
divide between the living and nonliving has only intensified in recent years with the
discovery of viruses with huge genomes that encode proteins which allow for mechanisms
that we do associate with life [9, 10, 11]. These gigantic viruses are more complex than some
bacteria, which even further obscures the question of their status and their origin.

The usual approach to trace the origins of a virus is to analyze the virus genome and
compare it to other strains and viruses with the hope of uncovering evolutionary relatedness
[12]. This can, however, be a futile endeavor as viruses mutate quickly, especially when
their genome is RNA based [13]. The phylogenetic approach thus often reduces to a
comparison of highly divergent sequences, which in effect prohibits the accurate extraction of
the evolutionary information and the determination of the relatedness of different viruses.
Viruses are thus typically classified only in terms of the type of genome that they contain
(double-stranded (ds) or single-stranded (ss) DNA, dsRNA, plus- and minus-sense ssRNA,
and reverse transcribing viruses) or whether they wrap themselves in a piece of cellular
membrane or not (enveloped and non-enveloped viruses).

The virus phenotypes, on the other hand, are often strikingly similar. The most obvious
similarity is between the protein shells (capsids) they form to protect and pack their
genomes. A large number of viruses have an icosahedral capsid that consists of many copies
of a single or a few very similar proteins arranged in a highly symmetrical manner that
can be defined in precise mathematical terms (see footnote 1). Thus, even for viruses which are highly
diverged and apparently unrelated (e.g. some may infect plant cells and others animal cells), there
are conserved features in their phenotype, i.e. in the proteins. A viable mode of investigation
of virus evolution and origin may thus be through the analysis of capsid protein structure
and function, which appear to be fairly conserved, even when the sequences coding for the
proteins are very divergent.

The icosahedral nature of the capsid requires proteins to assemble in precise relations
to their neighbors, and disruption of some key aspects of protein structure, such as spatial
distribution of hydrophobic/hydrophilic patches and/or charge, may completely block the
assembly of a virus from its constituents. This is an evolutionary dead end for a virus, so
there are obviously some physical constraints obeyed and encoded in the viral genome,
which must be preserved in the evolution of a virus. The aim of this paper is to accentuate
this information through the analysis of shape and size distribution of different icosahedral
viruses.

1 This is the basis of the Caspar-Klug classification scheme [14] that classifies different capsid icosahedra
in terms of the triangulation number T, which is, roughly, a way to divide the icosahedron into similar, nearly
equivalent parts that represent individual proteins.

As obvious as it may seem, this is by no means a straightforward and clearly defined
task. After all, what does it actually mean to analyze the similarities between the shapes of
different viruses, and what sort of information does one gain in the process? We shall seek the
answer to this question in the nonlinear theory of elastic shells [15] by carefully comparing a
large amount of different viral shape information to numerical predictions, and then
extracting a sequence of (Föppl-von Kármán) numbers pertaining to the sequence of shapes. When
dealing with real viruses, one is confronted with an experimentally determined structure that
contains spatial coordinates of (ideally) all the atoms that compose the virus. This is a huge
amount of data containing a detailed description of the virus surface. Some of these details
are of course important for virus attachment (receptor geometry), but there are also some
generalities that are expected to be a consequence of the elasticity of the protein shell. Our
work presents an attempt to identify the features of the shape related to elastic properties of
protein-protein interactions in the capsid, and to numerically quantify these properties.



2 Analysis 

2.1 Structural Dataset Used

In our analysis we have used approximately 130 capsid entries deposited in VIPERdb [16],
obtained from X-ray scattering experiments or cryo-electron microscopy. The capsid
triangulation numbers range from T = 1 to T = [...]. The number of viruses with triangulation
number T > 1, used in determining the elastic parameters, is approximately 100, with 29 of
them having a T-number greater than T = 3.

To the polyomaviruses and papillomaviruses we assign a triangulation number of T = 6
instead of the usually used T = 7, as they are composed of 360 copies of a protein and should
likely be considered as dodecahedral, not icosahedral structures [17]. We do this in order for
the number of proteins in a capsid to have a clear correspondence with its triangulation
number, which enables a more consistent analysis.



2.2 Calculation of Best-fitting Prototype Shape and Effective FvK Number of a Capsid 

To calculate the root-mean-square deviation between a given capsid and a numerically
predicted "prototype" shape, we first rotate the prototype shape so that the five-fold symmetry
axes of both shapes coincide. For a rotated prototype shape given by the vertices $\mathbf{v}_i$ and the
amino acid positions of the capsid $\mathbf{r}_j$, $j = 1, \ldots, N$, we calculate the RMSD as

$$\mathrm{RMSD}(\mathbf{v}, \mathbf{r}) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left\| \mathbf{r}_j - \tilde{\mathbf{v}}_j \right\|^2}, \qquad (1)$$

where $\tilde{\mathbf{v}}_j$ is the approximation for the closest point on the triangulated prototype surface to
the amino acid position $\mathbf{r}_j$. The closest point on the prototype surface to a given amino acid
will lie on the radial vector connecting the origin and the amino acid, piercing the prototype
surface. Since we do not have the exact surface shape but only its triangulated mesh [18],
we take for $\tilde{\mathbf{v}}_j$ the point where the radial vector pierces the tangential plane of
the prototype vertex $\mathbf{v}_i$ nearest to the amino acid $\mathbf{r}_j$. This gives a good approximation to
the true distance from the surface, provided the meshing is fine enough. We have also tried
several different distance measures, and found they influence the values of RMSD (a worse
approximation yielding bigger values), but not the position of the minimum FvK number
for a given capsid. To minimize the computational time we do the calculation for 1/60th of
a virus, since the rest of the amino acid positions in the capsid can be generated by applying
the rotation matrices of the icosahedral symmetry group, and thus do not influence the
final result.

2 The triangulation numbers denoted with p (pseudo) do not completely conform to the Caspar-Klug
principle of quasi-equivalence since the basic unit is composed of different (but morphologically similar)
proteins.
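
The distance measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the prototype mesh is taken to be already rotated and scaled to the capsid, unit outward normals at the mesh vertices are assumed to be available, and the nearest vertex is found by brute force. The names `rmsd_to_prototype`, `vertices`, `normals` and `residues` are hypothetical and do not come from the original analysis.

```python
import math


def _dot(a, b):
    """Scalar product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rmsd_to_prototype(vertices, normals, residues):
    """RMSD between residue positions and a triangulated prototype surface.

    vertices : list of (x, y, z), prototype mesh vertices v_i (aligned, same units)
    normals  : list of (x, y, z), unit surface normals at the vertices (assumed given)
    residues : list of (x, y, z), amino acid positions r_j, origin at the capsid centre
    """
    total = 0.0
    for r in residues:
        # Nearest prototype vertex to this residue (brute-force search).
        i = min(
            range(len(vertices)),
            key=lambda k: sum((r[c] - vertices[k][c]) ** 2 for c in range(3)),
        )
        v, n = vertices[i], normals[i]
        # The radial line t * r meets the tangent plane n.(x - v) = 0
        # at t = (n.v) / (n.r); that point approximates the closest surface point.
        t = _dot(n, v) / _dot(n, r)
        pierce = [t * c for c in r]
        total += sum((r[c] - pierce[c]) ** 2 for c in range(3))
    return math.sqrt(total / len(residues))
```

For a coarse check one can take the six axis points of a unit sphere as vertices (and as normals) and place residues at radius 1.1 along the same directions; every residue is then 0.1 away from its tangent plane. In practice only the residues of one asymmetric unit (1/60th of the capsid) need to be passed in.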



3 Results and Discussion 

Before proceeding to the shape analysis of a virus, it is important for our goal to perform a
simple analysis of the average virus size. This information is pertinent to the analysis of
elastic properties of the capsid, especially the thickness of proteins compared to the average
capsid size, as will be shown further on in the paper.

3.1 Effective Size of Virus Proteins 

The sizes of viruses vary across a large interval of mean radii (~ 10-200 nm) and if there is
something like a "typical" capsid (structural) protein, having some "typical" size, the larger
viruses should contain more copies of it. This means that larger viruses should have a larger
(triangulation) T-number (see footnote 3) instead of being built of a smaller number of copies of a bigger
protein and having a T-number comparable to smaller viruses. This proposition has been
confirmed by Rossmann and Erickson [19] on a dataset of viruses much smaller than the
one we use, so it is of interest to reexamine this issue and update the analysis. To this end,
we plot the mean radii of the capsids versus their triangulation numbers in Fig. 1.

If the idea of a viral protein of some prototypical size makes sense, then the average
capsid radius should be distributed according to

$$R = \sqrt{\frac{15 A_p}{\pi}} \, T^{1/2}, \qquad (2)$$

which was obtained simply from equating the area of an approximately spherical virus,
$4\pi R^2$, with the total area of assembled capsid proteins, $60 T A_p$, where $A_p$ is the average area
per protein. The fit of the form

$$R(T) = a \times T^b \qquad (3)$$

that we performed on the data shows that the idea indeed makes sense, as b = 0.41 ± 0.02 is
reasonably close to the expected 0.5. The average area of a prototypical capsid protein is obtained
from the coefficient a and is found to be $A_p \approx 20$ nm$^2$. A similar result was obtained by
Rossmann and Erickson [19]. They find that the ratio of the virus radius and the square root of
its T-number is approximately conserved, $R/\sqrt{T} \approx 10$ nm, which is in very good agreement
with our results.
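
The scaling fit and the area estimate can be illustrated with a short sketch. It assumes an ordinary least-squares fit in log-log space, which may differ from the fitting method actually used for Fig. 1, and it converts the prefactor a into an area per protein through 4πR² = 60 T A_p with b set to 1/2. The function names are hypothetical.

```python
import math


def fit_power_law(t_values, r_values):
    """Least-squares fit of log R = log a + b log T; returns (a, b)."""
    xs = [math.log(t) for t in t_values]
    ys = [math.log(r) for r in r_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b


def area_per_protein(a):
    """A_p from 4*pi*R^2 = 60*T*A_p with R = a*sqrt(T), i.e. A_p = pi*a^2/15."""
    return math.pi * a ** 2 / 15.0
```

With a = 9.6 nm, `area_per_protein` gives roughly 19 nm², and with the Rossmann-Erickson value of 10 nm it gives roughly 21 nm², both in line with the quoted estimate of about 20 nm². Note that the conversion is strictly valid only for b = 1/2.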



3 There are 60T proteins in a virus of a given T-number.
Fig. 1 Calculated mean radii of the capsids versus their triangulation numbers. The dashed line shows the
least squares fit of the data to the function $R = a \times T^b$, with a = 9.6 ± 0.4 nm and b = 0.41 ± 0.02. The data
consist of approximately 130 viruses, with circles denoting viruses with single-stranded genome, squares
viruses with double-stranded genome, diamonds bacteriophages, and triangles T = p3 ssRNA viruses.



Here we mention that the choice of the viruses included in the fit can have a certain
effect on the end results, as already noted elsewhere [20]. In Table 1 we compile the results
of the fit for various subsets of the viruses included in our analysis. If we perform the fit on
viruses with ss genome or with ds genome only, we obtain results similar to those obtained when the entire
database is included. Bacteriophages, on the other hand, do not conform to the relationship
in Eq. (2) that well, and excluding them from the fit gives a dependence of almost $T^{1/2}$.



Table 1 Fitting coefficients for the dependence of capsid mean radius on its triangulation number, $R(T) = a \times T^b$.
The results of the fit are shown for the entire dataset, and for subsets selected on the basis of the type
of the viral genome contained in the capsid.

dataset           a [nm]         b
entire            9.6 ± 0.4      0.41 ± 0.02
ss only           9.1 ± 0.6      0.43 ± 0.06
ds only           10.1 ± 0.7     0.45 ± 0.03
phage only        11 ± 1         0.33 ± 0.05
ss and ds only    8.75 ± 0.3     0.49 ± 0.02



We next investigate the average thickness of capsid proteins. In order to define the
average capsid protein thickness we have analyzed the radial mass distribution of each capsid,
which peaks around the capsid mean radius R. We have defined the average capsid
thickness $\delta$ as the full-width-at-half-maximum (FWHM) of this distribution (see Ref. [21] for
details). Note that our procedure implicitly assumes that the capsids are fairly spherical (as
they in fact are for the present purpose). The width may be artificially enlarged due to the
nonsphericity of the capsid, but the resulting corrections are too small to invalidate
the similarities of thicknesses among different capsid proteins that we obtain.
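
A thickness estimate of this kind can be sketched as follows. The details are assumptions for illustration only: masses are binned by distance from the capsid centre without any shell-volume normalization, the profile is taken to be single-peaked, and the half-maximum crossings are located by linear interpolation. The procedure of Ref. [21] may differ in all of these respects, and the function names are hypothetical.

```python
import math


def radial_mass_profile(coords, masses, bin_width):
    """Histogram of mass versus distance from the capsid centre (origin)."""
    dists = [math.sqrt(x * x + y * y + z * z) for x, y, z in coords]
    n_bins = int(max(dists) / bin_width) + 1
    profile = [0.0] * n_bins
    for d, m in zip(dists, masses):
        profile[int(d / bin_width)] += m
    centres = [(i + 0.5) * bin_width for i in range(n_bins)]
    return centres, profile


def fwhm(radii, density):
    """Full width at half maximum of a single-peaked sampled profile."""
    peak = max(range(len(density)), key=density.__getitem__)
    half = density[peak] / 2.0

    def crossing(i_in, i_out):
        # Linear interpolation between a sample at or above half maximum
        # (i_in) and its neighbour below half maximum (i_out).
        r0, r1 = radii[i_in], radii[i_out]
        d0, d1 = density[i_in], density[i_out]
        return r0 + (half - d0) * (r1 - r0) / (d1 - d0)

    left = peak
    while left > 0 and density[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(density) - 1 and density[right + 1] >= half:
        right += 1
    r_left = crossing(left, left - 1) if left > 0 else radii[0]
    r_right = crossing(right, right + 1) if right < len(density) - 1 else radii[-1]
    return r_right - r_left
```

The bin width has to be small compared with the expected thickness of a few nanometres but large enough that the histogram is not dominated by noise; the choice is left to the user in this sketch.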

Our results for average capsid protein thicknesses are shown in Fig. 2. The fit of the form
$\delta(T) = a \times T^b$ gives a small positive exponent for the T dependence, with a = 2.9 ± 0.2 nm and
b = 0.12 ± 0.05. We thus find that the thicknesses of capsid proteins are almost constant and
typically in the range of 2-4 nm, with thicker proteins being relatively rare, and intriguingly
almost equally distributed in viruses with ss and ds genomes.
Fig. 2 Distribution of capsid thicknesses. Top panel: Mean thicknesses of the capsids versus their
triangulation numbers. The dashed line shows the least squares fit of the data of the form $\delta = a \times T^b$, with a = 2.9 ± 0.2
nm and b = 0.12 ± 0.05. The symbols have the same meaning as in Fig. 1. Bottom panel: Probability
distribution of capsid thicknesses. Note that about two thirds of the capsids in the set have thicknesses between
2 nm and 4 nm.



Some of the thicker viruses fall into the T = 1 ds genome category, and a closer
inspection suggests more pronounced protrusions around five-fold axes than in the T = 1 viruses
with the ss genome, which might explain the increased thickness. Some of the other outliers
are two caliciviruses, Sindbis virus, Semliki Forest virus, and Nudaurelia capensis ω virus;
no obvious characteristic is shared among these viruses.

Taken together, our analysis shows that the concept of a typical capsid protein is indeed
viable - the typical capsid protein is prism-like, about 3 nm thick and has an average
diameter of 5 nm. The molecular weight pertaining to these protein dimensions can be estimated
well by $m_{\mathrm{prot}} \approx 2.7 \times 10^4$ amu (Fig. 3). What interests us next are the elastic properties of
the two-dimensional sheet made of such prototype proteins.
Fig. 3 Probability distribution of average protein mass of capsids. The average protein mass is obtained by
averaging over T-number of constituent proteins in a capsid. The majority of the viruses have the average mass
of their proteins around $m_{\mathrm{prot}} \approx 2.7 \times 10^4$ amu.




3.2 Elasticity of Infinitely Thin Icosa(delta)hedral Shells: Föppl-von Kármán Number

The virus capsid is a shell - as we have shown above, it is a structure formed by a layer of 
protein material "wrapped" so as to make a closed structure of icosahedral symmetry. The 
fact that the symmetry of the shell is icosahedral restricts the possible shapes of the capsid. 
Depending on the elastic properties of the protein capsid, i.e. the energetics of inter-protein 
contacts, such a shell may be more sphere-like, or more icosahedron-like. The sphere and 
the icosahedron are in fact two limiting shapes of the infinitely thin, continuum elastic icosa- 
hedrally symmetric shells with all the other allowed cases lying somewhere in between the 
two extremes. This opens the possibility to study the design and conserved elastic features 
of the virus proteins by analyzing the details of the shape of the icosahedral shell (capsid) 
they form. 

The details of the elastic theory of icosa(delta)hedral shells without the spontaneous
curvature have been elaborated in Refs. [15] and [18]. We shall present here only the most
important issues relevant to our aim. It has been shown that the shape of a continuum
icosahedral shell depends on a single parameter. This parameter is termed the Föppl-von Kármán
number ($\gamma$; FvK) in Ref. [15], and it depends on the elastic properties of (formally infinitely
thin) two-dimensional sheets made of virus proteins. There are two elastic parameters
characterizing the elastic response of such sheets: the two-dimensional Young's modulus $Y$,
which specifies the response of the sheet to stretching, and the bending rigidity $\kappa$, which
quantifies the energetics of the sheet bending. The FvK number of a thin shell with mean
radius R is a particular dimensionless combination of the elastic parameters of the shell and
its radius, defined as

$$\gamma = \frac{Y R^2}{\kappa}. \qquad (4)$$
3.3 Fitting the Shapes: Determining Effective FvK Number of Real Viruses

We have approached the problem of extracting the elastic parameters of the proteins from the
shape of the capsid by assigning an effective FvK number to a real capsid whose shape (i.e.
the positions of atoms) has been determined experimentally. We have first generated sixty
"prototype" shapes of different FvK numbers, i.e. the ideal shapes of the icosadeltahedral
shells obtained for continuum shell material (formally infinite T-number; we have used
T = 625 [18]). These shapes, 30 of which are shown in Fig. 4, were then compared to the
real capsid and the one that fitted the capsid best was found. The FvK number of the
best-fitting prototype shape was taken to be the effective FvK number of the virus.




Fig. 4 Prototype shapes. The FvK number increases from left to right and from top to bottom. The range of
FvK numbers goes from $\gamma = 10^{-1}$ (almost a sphere) to $\gamma = 10^{7}$ (almost an icosahedron), each shape in the
sequence having an FvK number larger than the previous one by a factor of 1.77. The colors show the relative
elastic energy contained in the parts of the shape surface, with yellow-colored regions having largest energy,
magenta-colored ones having smallest energy, and blue-colored regions being in between (see Refs. [15]
and [18] for details).

In order to conduct this scheme, one first needs to define the procedure for comparing
shapes. We did this by calculating the root-mean-square deviation (RMSD) between a capsid
and all 60 of the prototype shapes. By doing this we obtained the RMSD($\gamma$) dependence for
our set of analyzed capsids. The minimum of this curve for each capsid is the effective
FvK number associated with a given virus. An example curve is shown in Fig. 5 for the
bacteriophage P22 procapsid (yielding an FvK number of $\gamma = 406$), and Fig. 6 shows a real
virus shape of the bacteriophage PM2 and the corresponding best-fitting prototype with
$\gamma = 720$. Details of the procedure are given in the Analysis section.
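
The selection step itself is a discrete minimization over the set of prototype shapes, which can be sketched as below. The exact geometric spacing of the FvK grid is an assumption made for illustration (only the range and the number of shapes are stated in the text), and the function names are hypothetical.

```python
def fvk_grid(gamma_min=1e-1, gamma_max=1e7, n_shapes=60):
    """Geometrically spaced FvK numbers (spacing assumed for illustration)."""
    ratio = (gamma_max / gamma_min) ** (1.0 / (n_shapes - 1))
    return [gamma_min * ratio ** k for k in range(n_shapes)]


def effective_fvk(gammas, rmsds):
    """Return (gamma, rmsd) of the prototype shape with the smallest RMSD."""
    best = min(range(len(gammas)), key=lambda k: rmsds[k])
    return gammas[best], rmsds[best]
```

Here `rmsds[k]` is meant to be the RMSD between the capsid and the prototype shape with FvK number `gammas[k]`. Because the minimum is taken over a discrete grid, the effective FvK number is resolved only to within the grid spacing.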

A variant of the procedure described was most likely used in the analysis of L-A and
HK97 viruses in Ref. [15] (the authors do not specify the details). Here we have further
developed the procedure into a consistent fitting scheme and applied it to a large set of virus
shapes as a tool for determination of elastic properties of capsid proteins. It is of interest
to note here that for L-A and HK97 viruses we obtain values similar to those calculated in
Ref. [15] (718 and 1270 vs. 547 and 1480 obtained in Ref. [15]).

Fig. 5 Example determination of the effective FvK number of a virus. The virus in question is bacteriophage
P22 procapsid (PDB ID 2xyy), and the effective FvK number assigned to it is the FvK number of the prototype
shape with the minimum RMSD compared to the capsid. The procedure, detailed in the text, yields $\gamma = 406$
in this case.

Fig. 6 Real capsid and the best-fitting prototype shape. Left: The experimentally determined shape of
bacteriophage PM2 (PDB ID 2w0c), an icosahedral virus with triangulation number T = p21. The shape was
constructed as a union of spheres of radius 8.4 Å, each one representing an amino acid of a virus. Right: The
best-fit prototype shape with $\gamma = 720$.

The above procedure yields only an estimate of the elastic parameters of the capsids
for four reasons: (i) The overall shape of the capsid may sometimes be importantly defined
by the protein features not related to protein-protein interactions and the shell elasticity.
Some capsids for example have spikes, troughs, and other peculiarities related to the special
properties of the protein subunit. For such capsids, the procedure we propose may be less
reliable. (ii) The theory was constructed for infinitely thin shells, so it should be applied with
some reservation to viruses whose thickness is non-negligible with respect to their mean
radius. (iii) The theory applies to sheets without spontaneous curvature. (iv) The theory
applies to continuum shells, which means formally infinite T-numbers.

The point (iv) is particularly important for small viruses (see Ref. [22] for a discussion
of the buckling of small-T shells). To avoid false signals which may be obtained in the
described way, we have decided to analyze only the capsids with T > 1. For these capsids,
the results of the fitting procedure are shown in Fig. 7. For viruses with T = 2 and T = 3 a
large spread of the ratio of the elastic parameters $Y/\kappa$ is observed, ranging from 0.01 nm$^{-2}$
to almost 100 nm$^{-2}$.
Fig. 7 Ratio of capsid elastic parameters $Y/\kappa$ compared to the capsid mean radius. The values of $Y/\kappa$
were obtained from the best-fit FvK number, $\gamma = R^2 Y/\kappa$. The area between dashed lines shows the interval
[0.1, 2] nm$^{-2}$, which contains the $Y/\kappa$ values of the majority of T > 3 capsids. Symbols have the same
meaning as in Fig. 1, with the exception of the T = p3 ssRNA viruses being shown in the same way as the rest of
ssRNA viruses (circles).



When we move to higher T-numbers - viruses with T > 3 - the data show a reasonably
converged distribution of $Y/\kappa$ values, with most of the values falling into the interval
[0.1, 2] nm$^{-2}$, indicated by the gray rectangle in Fig. 7. The notable exception though
is the point located at mean radius of R = 27.85 nm, having the value of the ratio of
$Y/\kappa = 9.1$ nm$^{-2}$; this point belongs to PM2 bacteriophage. PM2 bacteriophage is a special
virus in the sense that below the outer protein shell it contains a mixed lipid-protein
membrane (i.e. it has a proteinaceous lipid core) [23]. However, so does the PRD1
bacteriophage [24], whose $Y/\kappa$ ratio is within the typical range of values (0.8-1.5 nm$^{-2}$). The
large $Y/\kappa$ ratio observed in PM2 is thus most likely due to its very pronounced spike
proteins around the five-fold axes, which may influence our shape analysis and push its effective
FvK number towards larger values; the PRD1 bacteriophage, on the other hand, has no
such prominent features. These two cases again accentuate the care required in interpretation
of our results.

The wide range of the $Y/\kappa$ values for T = 3 viruses and the much
narrower range of this ratio for viruses with higher T-numbers might be explained, at least
qualitatively and conceptually, by a transition between a mostly geometric domain and a
continuum domain as proposed in Ref. [25]. In the former domain the geometry of the
protein subunits influences the capsid morphology. After a certain capsid size or T-number the
shapes are then largely described by continuum elasticity where the FvK number is the relevant
parameter. Due to our limited dataset it is difficult to say where exactly this transition occurs,
but Fig. 7 suggests that this happens somewhere around R = 20 nm. Intriguingly, a very
similar finding was reported recently in Ref. [26], where the authors report a strong
correlation between $\gamma$ and a "degree of buckling" in T = 7 capsids, and lack of such a correlation
in T = 3 capsids.

Bending rigidity is related to the energy required to change the angle between two capsid
proteins in flat contact, while the Young's modulus measures how difficult it is to stretch
the two capsid proteins in contact, keeping them flat. Having in mind all the intricacies of
protein-protein interaction, it is not easy to see that the two quantities ought to be in any
particular relation, so that they should be, in general, treated as completely independent
parameters. The fact that the ratio of two independent elastic parameters falls in an interval
only about an order of magnitude wide points again to a possible conservation of this quantity.

The ratio $Y/\kappa$ can be obtained as a function of the thickness of the shell in the case of an
isotropic elastic material, which the protein shell is obviously not, as the proteins preferably
bind only in 2D. Nevertheless, if we insist on approximating the proteins as isotropically
interacting particles, we have that $Y/\kappa = 12(1 - \nu^2)/\delta^2$, where $\delta$ is again the shell thickness
and $\nu$ the Poisson ratio of the material. Interestingly, for $\delta \approx 3$ nm and $\nu \approx 0.3$, which is
typical for many materials, this gives $Y/\kappa \approx 1$ nm$^{-2}$, which fits nicely in the range we
obtained (but also with the values found in Refs. [15] and [26]), suggesting that the elastic
response is essentially fixed with conservation of $\delta$ (i.e. conservation of protein size we have
already established). This is further corroborated by the examination of extremal values of $\delta$
and $Y/\kappa$ in our set. Sindbis virus (PDB ID 1ld4) has the largest $\delta$ in our dataset (11.27 nm),
but also one of the smallest ratios $Y/\kappa$ (0.08 nm$^{-2}$), consistent with the simple prediction
$Y/\kappa \propto \delta^{-2}$. However, when we constructed the quantity $Y\delta^2/\kappa$ we found that its spread is
not significantly smaller than the spread of $Y/\kappa$ values, suggesting that this ratio contains
more information than simply $\delta$.
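
The two routes to the ratio of elastic parameters discussed above can be compared numerically with a small sketch. It assumes the isotropic thin-plate relation quoted in the text and the definition of the FvK number as the product of Y/κ and the squared mean radius (as stated in the caption of Fig. 7); the Poisson ratio of 0.3 and the function names are illustrative choices, not fitted quantities.

```python
def ratio_thin_plate(delta_nm, poisson=0.3):
    """Y/kappa = 12 (1 - nu^2) / delta^2 for an isotropic plate, in nm^-2."""
    return 12.0 * (1.0 - poisson ** 2) / delta_nm ** 2


def ratio_from_fvk(gamma, radius_nm):
    """Y/kappa = gamma / R^2, from gamma = Y R^2 / kappa, in nm^-2."""
    return gamma / radius_nm ** 2
```

For a thickness of 3 nm the thin-plate estimate is about 1.2 nm^-2, and for the Sindbis thickness of 11.27 nm it is about 0.086 nm^-2, close to the fitted value of 0.08 nm^-2 quoted above. The quantity `ratio_from_fvk(gamma, R) * delta**2` corresponds to the combination Yδ²/κ discussed in the text.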

4 Summary and Conclusions 

Using statistical analysis we have shown that capsid proteins of the analyzed viruses are similarly
sized, prism-like, about 3 nm thick, having an average diameter of 5 nm and an average
molecular weight of $m_{\mathrm{prot}} \approx 2.7 \times 10^4$ amu. This is by no means a trivial finding. In the
early days of virus structure research [27] it was argued that viruses cannot code for
large structural proteins as it would require long genomes, and the capsid thus must be
assembled from many copies of a smaller protein. Although this sounds reasonable, one
should not forget that the quantity of information that can be stored in the capsid scales
as $R^3$, presuming the genome is uniformly distributed within the capsid, which is the case
at least for bacteriophages [28, 29]. Were all capsids made by assembling 60 protein units
(T = 1) of similar thickness, the quantity of information required for coding the capsid
proteins would then scale as $R^2$, so the percentage of information required for structural
proteins would be vanishingly small for large enough viruses, i.e. large enough genomes.
This seemingly obvious statement is perhaps further strengthened by the fact that there
are huge viruses where the spatial constraints do not seem to be the critical issue (see footnote 4). Yet there
are apparently no huge icosahedral viruses with small T-numbers, i.e. made of huge capsid
proteins. The capsid protein size thus appears to be an evolutionarily conserved feature.

An important question is whether the elasticity of the capsid is a property which is under
evolutionary pressure and thus makes a difference for the functioning of a virus. There is at
least one type of virus for which we can be fairly certain that the answer is affirmative -
the dsDNA bacteriophages [29]. The bacteriophages pack their dsDNA molecule tightly in the
preformed capsid, building up an effective outward mechanical pressure on the capsid
of up to almost a hundred atmospheres. Such huge pressures would induce large displacements
in elastically soft capsid material, resulting in eventual rupture of protein contacts formed
either through hydrophobic-van der Waals association or electrostatic complementarity.
Thus, at least for these types of viruses the elastic properties of capsids should be under
evolutionary pressure and should converge to some functional range.

4 The analysis of available space and restricted length of the genome is somewhat different in ssRNA
viruses, where the ssRNA molecule is held within the capsid mostly by electrostatic interactions with the
capsid proteins. In this case, the quantity of information that can be stored in the capsid scales as $R^2$ [4],
so that the production of large proteins always requires the same percentage of the ssRNA information,
irrespective of the radius of the virus.

This does not exclude the possibility of the conservation of elastic properties in other 
types of viruses, where possibly different types of elastic constraints might be at work. The 
situation of RNA virus capsids is in fact exactly reversed, as the (small) force of the genome 
on the capsid creates an effective inward mechanical pressure, being of electrostatic bridging 
origin [4]. The elastic constraints in this case would play a prominent role not so much in 
the context of structural rigidity of the capsid as in the whole self-assembly and maturation 
process. 

Last but not least, in order to penetrate the cell viruses often undergo complicated
and multi-stepped paths (e.g. receptor attachment, membrane wrapping, etc.) which include
many interactions, eventually resulting in effective mechanical forces on the capsid. The
architecture of the capsid and mechanical properties of its building blocks must be well suited
to successfully complete this, maybe the most fragile, part of the viral life-cycle.

By combining the statistical analysis with the theory of elasticity we have analyzed the 
elastic properties of the virus capsids. Our results suggest a reasonable convergence of the 
elastic properties of the viruses we inspected. To fully evaluate the power of the methods we 
proposed to discriminate among different virus architectures and possibly even lineages, a 
significantly larger virus dataset would need to be analyzed which is at present not possible. 